2_Sequence_bioinformatics
Master Medical Biometry/Biostatistics, Introduction to Bioinformatics, Medizinische Fakultät Heidelberg
The purpose of these exercises is to introduce you to common bioinformatics procedures with web tools, using HIV as a case study.
Provide answers to the questions that are marked with ‘Q’.
Biological background
This animation describes, in simple terms, the life cycle of HIV and explains how the virus may be fought by inhibiting critical mechanisms.
Tools
For many of the following exercises we will make use of programs that are part of the EMBOSS package (The European Molecular Biology Open Source Software Suite).
Translation of nucleotide sequences
- Program: sixpack - translates a nucleotide sequence into its six possible reading frames
- Data: gag_mrna.fa - mRNA of the GAG gene

Here we will identify possible peptides (or proteins) from the mRNA of the GAG gene.

- Locate the program and run it using default parameters
- The program will create two output files on the same page: outfile and outseq
- Look at the amino acid sequences for the first 120 nucleotides in the sixpack outfile
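sixpack does this for you, but the idea is simple: read the sequence three bases at a time from each of three offsets on the forward strand, then repeat on the reverse complement. Below is a minimal stdlib-only sketch using the standard genetic code. The F1-F3/R1-R3 labels are our own convention and may not match sixpack's numbering of the reverse frames exactly.

```python
from itertools import product

# Standard genetic code. Codons are enumerated in TCAG order
# (TTT, TTC, TTA, TTG, TCT, ..., GGG); '*' marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA string; unknown bases become 'N'."""
    return "".join(COMPLEMENT.get(base, "N") for base in reversed(seq))


def translate(seq: str) -> str:
    """Translate complete codons only; unknown codons become 'X'."""
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3)
    )


def six_frames(seq: str) -> dict[str, str]:
    """Translate three forward (F1-F3) and three reverse (R1-R3) frames."""
    dna = seq.upper().replace("U", "T")  # accept mRNA (U) as well as DNA (T)
    rev = reverse_complement(dna)
    frames = {}
    for offset in range(3):
        frames[f"F{offset + 1}"] = translate(dna[offset:])
        frames[f"R{offset + 1}"] = translate(rev[offset:])
    return frames
```

For example, `six_frames("ATGGCCTAA")["F1"]` returns `"MA*"`: a methionine, an alanine and then a stop codon. A frame that runs for hundreds of residues without a `*` is the kind of frame worth a closer look.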
Q1. Which of the reading frames is likely to encode a protein? How could you tell?
Answer: F1 is the most likely to be translated, since its sequence is not interrupted by stop codons.
This is the GAG protein:
>Gag_protein gi|2801504|gb|AAC82593.1| Gag [Human immunodeficiency virus 1]
MGARASVLSGGELDRWEKIRLRPGGKKKYKLKHIVWASRELERFAVNPGLLETSEGCRQILGQLQPSLQT
GSEELRSLYNTVATLYCVHQRIEIKDTKEALDKIEEEQNKSKKKAQQAAADTGHSNQVSQNYPIVQNIQG
QMVHQAISPRTLNAWVKVVEEKAFSPEVIPMFSALSEGATPQDLNTMLNTVGGHQAAMQMLKETINEEAA
EWDRVHPVHAGPIAPGQMREPRGSDIAGTTSTLQEQIGWMTNNPPIPVGEIYKRWIILGLNKIVRMYSPT
SILDIRQGPKEPFRDYVDRFYKTLRAEQASQEVKNWMTETLLVQNANPDCKTILKALGPAATLEEMMTAC
QGVGGPGHKARVLAEAMSQVTNSATIMMQRGNFRNQRKIVKCFNCGKEGHTARNCRAPRKKGCWKCGKEG
HQMKDCTERQANFLGKIWPSYKGRPGNFLQSRPEPTAPPEESFRSGVETTTPPQKQEPIDKELYPLTSLR
SLFGNDPSSQ
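The GAG protein above is in FASTA format, as are the data files used in these exercises (gag_mrna.fa, rev_prot.fa, env_prot.fa, rt_isolates.fa): a header line starting with `>` followed by one or more sequence lines. A minimal stdlib-only reader might look like this, assuming well-formed files; note that it keeps only the last record if two records share the same header.

```python
def parse_fasta(text: str) -> dict[str, str]:
    """Parse FASTA text into {header: sequence}; headers exclude the '>'."""
    records = {}
    header = None
    chunks: list[str] = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            if header is not None:
                records[header] = "".join(chunks)  # close the previous record
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)  # sequence lines are concatenated
    if header is not None:
        records[header] = "".join(chunks)
    return records
```

Usage, with a local copy of one of the data files: `with open("gag_mrna.fa") as handle: records = parse_fasta(handle.read())`.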
- Locate it in the sixpack outseq
Q2. To which ORF does it match? How many other ORFs were predicted from the GAG mRNA?
Answer: It matches ORF1, and there are 142 predicted ORFs.
BLAST and homologues of HIV proteins
Acquired Immune Deficiency Syndrome (AIDS) is caused by two closely related variants of Human Immunodeficiency Virus: type 1 (HIV-1) and type 2 (HIV-2). HIV-1 is responsible for the global pandemic, while HIV-2 has, until recently, been restricted to West Africa and appears to be less virulent. Viruses related to HIV have been found in many species of non-human primates (monkeys, apes, …) and have been named Simian Immunodeficiency Virus (SIV).
- Program: BLAST - identifies homologous sequences (nucleotides, amino acids) to a query sequence (nucleotide, amino acid)
- Data: rev_prot.fa - this is the reverse transcriptase protein in HIV
Here we will identify proteins in other organisms that are similar to the HIV reverse transcriptase.
- Go to the BLAST webpage
- Select the correct “flavor” of BLAST
- Use the reverse transcriptase protein as query
- Select UniProtKB/Swiss-Prot as database
- Click BLAST
Once the analysis finishes:
- Go to the Graphic summary tab
- There you can see the different hits found and their corresponding alignment scores
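The Graphic summary is a visual version of the hit list. If you prefer to find the weakest hit programmatically, you could download the hits as a tab-separated hit table. The sketch below assumes the 12-column layout of BLAST+ tabular output (`-outfmt 6`), with optional comment lines starting with `#`; check the column header of your own download before relying on this order.

```python
# Assumed column order of BLAST+ tabular output (-outfmt 6).
COLUMNS = [
    "qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
    "qstart", "qend", "sstart", "send", "evalue", "bitscore",
]


def read_hits(text: str) -> list[dict]:
    """Parse tab-separated hit rows, skipping blank and '#' comment lines."""
    hits = []
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        row = dict(zip(COLUMNS, line.split("\t")))
        row["evalue"] = float(row["evalue"])
        row["bitscore"] = float(row["bitscore"])
        hits.append(row)
    return hits


def weakest_hit(hits: list[dict]) -> dict:
    """Return the hit with the lowest bit score."""
    return min(hits, key=lambda hit: hit["bitscore"])
```

For example, with a downloaded table saved under the hypothetical name `hits.txt`, `weakest_hit(read_hits(open("hits.txt").read()))["sseqid"]` gives the subject ID of the lowest-scoring hit.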
Q3. What is the lowest alignment score? What is the description of the protein?
Answer: Score = 44, with an E-value of 5.2e-06, corresponding to Accession P27971.1:
RecName: Full=Protein Rev; AltName: Full=Regulator of expression of viral proteins [Simian immunodeficiency virus (AGM155 ISOLATE)]
This is not a reverse transcriptase, but a protein that shares a similar sequence motif with the query.
- Go to the Taxonomy tab
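The Taxonomy tab groups the hits by organism. Swiss-Prot descriptions end with the organism name in square brackets (as in the P27971.1 description in the answer to Q3), so a rough text-based version of this grouping can be sketched by counting those bracketed names. This is an assumption about the description format only; the BLAST Taxonomy report presumably relies on taxonomy identifiers rather than on text.

```python
import re
from collections import Counter

# Assumes the organism is the last bracketed text in a description, e.g.
# "... [Simian immunodeficiency virus (AGM155 ISOLATE)]".
ORGANISM = re.compile(r"\[([^\[\]]+)\]\s*$")


def count_organisms(descriptions: list[str]) -> Counter:
    """Count hits per organism; descriptions without brackets count as 'unknown'."""
    counts = Counter()
    for description in descriptions:
        match = ORGANISM.search(description)
        counts[match.group(1) if match else "unknown"] += 1
    return counts
```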
Q4. Are you able to find REV proteins from human (HIV-2) and monkey (SIV)?
Answer: 5 SIV sequences: SIVcpz MB66, EK505, GAB1 and TAN1, and Simian immunodeficiency virus (AGM155 ISOLATE).
There are no HIV-2 sequences; however, if we allow more hits, we will find them, since HIV-2 is evolutionarily related to HIV-1, but quite distant.
Multiple alignments - Origin of HIV
- Program: Clustal Omega, the web interface of a multiple sequence alignment program
- Data: env_prot.fa - file with 13 different protein sequences from isolates of HIV-1, HIV-2, chimpanzee (SIVCZ) and macaque monkey (SIVM1 and SIVML)
- Go to the Clustal Omega webpage
- Upload the proteins file (or paste the sequences)
- Select protein as format
- Run with default parameters

Once you have the results:
- Note the order of the sequences
- Select Phylogenetic Tree
- At the bottom of the page you will see the phylogenetic tree (evolutionary relationships) of your sequences
- Toggle between the Radial and non-radial views
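A tree like this one reflects how similar the aligned sequences are to each other. As an illustration only (Clustal Omega uses its own tree-building procedure, which this does not reproduce), the sketch below computes an uncorrected p-distance, the fraction of differing residues over columns where neither sequence has a gap, for every pair of aligned sequences, and reports the closest pair.

```python
from itertools import combinations


def p_distance(a: str, b: str) -> float:
    """Fraction of differing residues over columns where neither has a gap."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    compared = differences = 0
    for x, y in zip(a, b):
        if x == "-" or y == "-":
            continue  # skip gapped columns
        compared += 1
        differences += x != y
    return differences / compared if compared else float("nan")


def distance_matrix(alignment: dict[str, str]) -> dict[tuple[str, str], float]:
    """p-distance for every unordered pair of sequence names."""
    return {
        (n1, n2): p_distance(alignment[n1], alignment[n2])
        for n1, n2 in combinations(alignment, 2)
    }


def closest_pair(alignment: dict[str, str]) -> tuple[str, str]:
    """Names of the two most similar sequences in the alignment."""
    matrix = distance_matrix(alignment)
    return min(matrix, key=matrix.get)
```

For example, with the hypothetical toy alignment `{"A": "MKV-LLA", "B": "MKVQLLA", "C": "MRIQLSA"}`, `closest_pair` returns `("A", "B")`, the two sequences that would be expected to sit next to each other in a tree.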
Q5. What does this tree tell us about the phylogenetic relationship of HIV-1, HIV-2 and SIV?
Answer: HIV-2 and SIVM cluster together, while HIV-1 and SIVCZ form a separate cluster.
This hints at how HIV was transmitted from non-human primates to humans: HIV-1 from chimpanzees and HIV-2 from monkeys.
Multiple alignments - HIV drug resistance
A number of drugs against HIV have been developed. One example is AZT, which acts as an inhibitor of the reverse transcriptase (RT) encoded by the HIV genome. AZT binds to the active site of the RT and, as a result, blocks its polymerase activity. However, the mutation frequency of the HIV genome is very high, and resistance to AZT develops easily. This typically occurs through amino acid changes close to the active site that reduce the affinity for AZT.
- Program: ClustalO, the web interface of a multiple sequence alignment program
- Data: rt_isolates.fa - file containing the amino acid sequences of the RT from AZT-resistant as well as AZT-sensitive strains

Make a multiple alignment of the RT isolates:
- Go to the Tool output tab
- There are 2 mutations that are responsible for resistance to the treatment
Q6. What are these positions and what are the amino acid changes? Hint: Find two positions that have been mutated in all the AZT resistant strains but not in the sensitive strain.
Answer:
- Position 67: D -> N (D67N)
- Position 70: K -> R (K70R)
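The hint in Q6 can be written as a column-by-column check over the alignment: keep the columns where every resistant isolate differs from the sensitive one. A minimal sketch, assuming the alignment has been read into a dict of equal-length aligned strings (for example, after downloading the Clustal Omega alignment in FASTA format). Note that alignment column numbers match RT residue numbers only if the sensitive sequence starts at residue 1 and has no gaps before that column.

```python
def resistance_positions(
    alignment: dict[str, str], sensitive: str, resistant: list[str]
) -> list[tuple[int, str, str]]:
    """Find alignment columns where every resistant sequence differs from the
    sensitive one.

    Returns (column, sensitive residue, resistant residues) tuples with
    1-based column numbers.
    """
    reference = alignment[sensitive]
    found = []
    for i, ref_aa in enumerate(reference):
        others = [alignment[name][i] for name in resistant]
        if all(aa != ref_aa for aa in others):
            found.append((i + 1, ref_aa, "".join(sorted(set(others)))))
    return found
```

With a hypothetical toy alignment `{"sens": "MPDSKW", "res1": "MPNSRW", "res2": "LPNSRW", "res3": "MPNSRW"}`, the call `resistance_positions(aln, "sens", ["res1", "res2", "res3"])` returns `[(3, "D", "N"), (5, "K", "R")]`. Column 1 is not reported, because only one resistant sequence differs there.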
HIV-1 RT structure
As a reminder, here are the main treatment strategies used against HIV:
| Treatment |
|---|
| Blocking entry into the host cell with fusion inhibitors |
| Inhibition of reverse transcriptase by nucleoside inhibitors |
| Inhibition of reverse transcriptase by non-nucleoside inhibitors |
| Blocking of the integrase |
| Inhibition of the protease |
Focusing on the reverse transcriptase (RT), let’s identify the key elements that are targeted by treatments against HIV infection, and see how the virus develops resistance to these drugs.
- Program: iCn3D - web-based 3D structure viewer
- Data: 1RTD - X-ray crystal structure of HIV-1 RT in complex with DNA
If you have time:
- Open the structure link in a separate window
- This will take you to the Structure database at NCBI
- Focus on the Molecular Graphic window
- Click on full feature 3D viewer
Under Sequences and Annotations, you will see all the
different molecules of 1RTD:
- Two proteins:
  - chain A in light gray (1RTD_A)
  - chain B in yellow (1RTD_B)
- Two nucleotide sequences: the DNA-RNA complex, in blue and pink respectively (1RTD_E and 1RTD_F)
- Some chemicals:
  - four magnesium ions in green (1RTD_MG, 1RTD_MG2, 1RTD_MG3 and 1RTD_MG4), which are needed to stabilize the structural conformation
  - a thymine (T), which will be incorporated into the DNA by this machinery
Look at the structure from different angles by dragging the mouse while holding down the left button.
Let’s clean the structure. Select 1RTD_A in the right
panel and click on:
Style -> Proteins -> Hide
Do the same with 1RTD_B. You can now easily see the DNA-RNA complex together with the MG and T molecules.
Let’s bring back chain A. Select 1RTD_A and then:
Style -> Protein -> Ribbon
You can see how the DNA-RNA complex sits along the protein, which guides the incorporation of the thymine (T). Let’s focus on the “hand”.
Select 1RTD_A from the
Sequences and Annotations window, then click on:
Style -> Protein -> Hide
To highlight the “hand”:
Select -> Advanced
In the new window fill in with the following values:
Select: .A:1-324
Name: polymerase
Click on Save Selection to Defined Sets. To view our defined sets, click Select -> Defined Sets; a new window will appear listing all the different molecules. Scroll down and select polymerase, then:
Style -> Protein -> Ribbon
Color -> Unicolor -> Red
To highlight the “thumb”, use Select -> Advanced to create a new selection:
Select: .A:245-324
Name: thumb
Click on Save Selection to Defined Sets. Under the
Select sets window, select thumb and then:
Color -> Unicolor -> Yellow
Let’s add the catalytic aspartates, which are critical for polymerase function. Use Select -> Advanced to create a new selection:
Select: .A:110,185,186
Name: aspartates
Click on Save Selection to Defined Sets. Under the
Select sets window, select aspartates and
then:
Style -> Protein -> Sphere
Color -> Unicolor -> Cyan
And finally, a key tyrosine that stabilizes the template-primer with a hydrogen bond. Use Select -> Advanced to create a new selection:
Select: .A:183
Name: tyrosine
Click on Save Selection to Defined Sets. Under the
Select sets window, select tyrosine and
then:
Style -> Protein -> Sphere
Color -> Unicolor -> Gray
If you are short on time:
- Open the modified structure in a separate window
- Rotate the figure until you see the hand and the key elements (aspartates, tyrosine, magnesium ions) as in this figure
In a previous exercise, you aligned RT sequences from AZT resistant strains using Clustal Omega. You then identified two residues that are mutated in all three AZT resistant isolates.
- Highlight these positions in the structure
Select -> Advanced
- In the new window fill in with the following values (X and Y are the positions of the mutations):
Select: .A:X,Y
Name: mutations
- Click on Save Selection to Defined Sets
- Under the Select sets window, select mutations and then:
Style -> Protein -> Sphere
Color -> Unicolor -> A color of your choosing
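If you want to double-check where a residue falls before typing a selection, the residue ranges used in this exercise (hand/polymerase .A:1-324, thumb .A:245-324) can be encoded in a few lines. This is a hypothetical convenience sketch: the helper names are our own and are not part of iCn3D, and it only assumes the `.chain:residues` pattern typed into Select -> Advanced in the steps above.

```python
# Residue ranges copied from the selections in this exercise (chain A, 1RTD).
HAND = range(1, 325)     # .A:1-324, saved as "polymerase"
THUMB = range(245, 325)  # .A:245-324, saved as "thumb"


def selection(chain: str, residues: list[int]) -> str:
    """Build a string in the '.A:110,185,186' pattern typed into Select -> Advanced."""
    return f".{chain}:" + ",".join(str(r) for r in sorted(residues))


def region(residue: int) -> str:
    """Say where a residue number falls relative to the ranges above."""
    if residue in THUMB:
        return "thumb"
    if residue in HAND:
        return "hand, outside the thumb"
    return "outside the selected hand"
```

For example, `selection("A", [186, 110, 185])` returns `".A:110,185,186"`, and `[region(r) for r in (110, 183, 185, 186)]` places the catalytic aspartates and the tyrosine in the hand but outside the thumb. You can run the same check on the two positions you found in Q6.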
| Figure description |
|---|
| Chain A + Chain B |
| Chain A + DNA/RNA duplex |
| DNA/RNA duplex + T + Mg ions |
| Hand in red |
| Thumb in yellow |
| Aspartates in cyan |
| Tyrosine in gray |
| Mutations in pink |
| Focusing on the hand and the DNA/RNA duplex |
| Rotating the structure |
| Focusing only on the hand |
Q7. How could these mutations interfere with the treatment of HIV?
Answer: When these positions change, the drug binds less well and no longer blocks the reverse transcription of the viral genome.